Abstract

The rise of social media provides a great opportunity for
people to reach out to their social connections to satisfy
their information needs. However, generic social media
platforms are not explicitly designed to assist information
seeking of users. In this paper, we propose a novel
framework to identify the social connections of a user
able to satisfy his information needs. The information
need of a social media user is subjective and personal,
and we investigate the utility of his social context to
identify people able to satisfy it. We present questions
users post on Twitter as instances of information seeking
activities in social media. We infer soft community
memberships of the asker and his social connections by
integrating network and content information. Drawing
concepts from the social foci theory, we identify answerers
who share communities with the asker w.r.t. the
question. Our experiments demonstrate that the framework
is effective in identifying answerers to social media
questions.


Information seeking is defined as “A conscious effort to
acquire information in response to a need or gap in
knowledge” (Case 2012). Online social media makes it easier
for users to reach out to a large number of friends, leading
people to use it to seek information from their social
connections. This gives rise to a distinct form of online
information seeking, wherein the information needs expressed
are subjective and personal to the asker. An interesting
way people leverage online social media to seek information
is by asking questions through their status messages
(Morris, Teevan, and Panovich 2010). This phenomenon is
prevalent in social media platforms like Twitter and
Facebook and has received considerable attention in recent
literature (Efron and Winget 2010a; Paul, Hong, and Chi 2011;
Lampe et al. 2014).


However, unlike dedicated Q&A platforms, generic social
media sites like Twitter and Facebook are not designed for
information seeking (Paul, Hong, and Chi 2011). Questions
are not archived, thus finding people who answered similar
questions in the past is difficult. Questions are buried among
other content produced by the social connections of a potential
answerer. Designing algorithmic frameworks to identify
answerers to social media questions will help to bridge the
information gap of users and increase user satisfaction. Such a
framework can also help enhance Twitter search by making
it personalized to the asker.

Copyright © 2015, Association for the Advancement of Artificial
Intelligence (www.aaai.org). All rights reserved.

The information need of a social media user is subjective
and personal, unlike the needs expressed on traditional Q&A
platforms like StackOverflow, and his social context is useful
to find appropriate people able to satisfy it (Hecht et al. 2012).
Also, users with higher tie strength with the asker were
shown to better satisfy information needs in social media
(Panovich, Miller, and Karger 2012). For example, to assist
a person looking to get a new hairstyle, finding people from
his social connections who share related context with him
can be more useful to him than finding web pages related to
hair salons.

This task faces several challenges. The questions are textual,
while the social context of the asker can involve network
information. Integrating such heterogeneous information
will help to efficiently utilize social context to identify
answerers to social media questions. Each social media
user has many social connections and produces a lot of content,
leading to significant issues of scalability. Finally, the
social context of the asker related to the question needs to
be determined and appropriately utilized in order to identify
suitable answerers.

In sociological literature, the social foci theory postulates
that interactions between people are organized around relevant
entities known as foci (Feld 1981). A focus can be the
activities, interests, and various affiliations of a user. Different
groups of social connections of a user share different foci
with him. For example, from Fig. 1(a) we see that the user
shares an interest of sports with his connections in green,
an interest of music with his connections in yellow and
academic interests with his connections in red.

Inspired by the social foci theory, we propose that people
in social media who share social foci related to the question
with the asker are suitable to answer it. Illustrative examples
of questions are given in Fig. 1(b). The asker of Q1 is
seeking assistance with his math homework, and this might be
best answered by users sharing academic foci with him. Q2
is seeking opinions on an NFL game, and these might be best
provided by his connections sharing foci related to sports
with the asker. Similarly, Q3 might be best answered by
connections sharing music related foci with the asker.
Figure 1: (a) Different foci a user shares with his social
connections. (b) Questions of users. Users sharing different foci
with the asker are more likely to answer related questions.

In this paper, we propose a framework to investigate the
utility of social context derived from network and content
information in identifying answerers to social media questions.
Informed by the concepts of social foci theory illustrated
in Fig. 1, we utilize the social context of an asker related
to the question, and demonstrate that the framework is
effective in identifying answerers for social media questions.
Specifically, we address the following questions: How to utilize
the network and content information of the asker and
his social connections to better identify answerers for social
media questions? Are approaches based on shared context in
the question domain useful in identifying answerers to
different kinds of social media questions?

The main contributions of our work are as follows: 

• Formally defining the problem of finding suitable users to 
answer questions in online social media platforms, 

• Proposing a framework to exploit network and content
information to identify answerers to questions, and

• Conducting experimental evaluations of the framework on 
a dataset of social media questions. 


[Figure 1(b), panel "Questions": Q1: Who can help me with my
math homework? Q2: Thoughts on the next NFL game? I'm
undecided. Q3: Need to update my playlist. Any suggestions?]


Related Work 

Social media questions have received considerable
attention in research communities (Yang et al. 2011;
Efron and Winget 2010b; Lee et al. 2012). An analytical
study of questions asked and the answers received on
Twitter is presented in (Morris, Teevan, and Panovich 2010;
Paul, Hong, and Chi 2011). They indicated that subjective
questions were the most prevalent and that the trust
users have in their friends was the primary factor for
asking questions. A study of questions and responses
received on Facebook was conducted in (Gray et al. 2013;
Ellison et al. 2013), and bridging social capital was
proposed to be a strong motivation for Q&A activity in social
media. These works give interesting insights into the question
answering process in social media, but do not focus on
identifying answerers to these questions.

Systems to identify answerers for social media questions
adopt different methods such as matching question content
with profile information (Hecht et al. 2012) and using
crowdsourced technology (Jeong et al. 2013). Social search
architectures and empirical models to route questions to
answerers using different kinds of social information are
discussed in (Horowitz and Kamvar 2010; Nandi et al. 2013).
These works are meant to demonstrate social search systems
and hence do not contain any experimental evaluations.

A related line of research is the study of community
Q&A systems like Yahoo! Answers (Adamic et al. 2008)
and Quora (Wang et al. 2013a). Content from existing
Q&A sessions is used to rank answerers by NLP techniques.
(Jurczyk and Agichtein 2007) uses link structure
to find authoritative answerers for a question category.
(Zhou et al. 2012) and (Yang et al. 2013) combine network
and content information to identify authoritative users as
answerers. The environment for social media questions is
different, as the candidate answerers are themselves connected
via social relations. Systems utilizing question categories
(Zhu et al. 2013) cannot be applied, as the categories are not
explicitly known in generic social media. Social expertise systems
(Pal and Counts 2011; Bozzon et al. 2013) identify subject
matter experts in social media. Social media questions are
subjective and personal, and might require answerers who share
social context with the asker rather than subject matter
experts.

Another field related to our work is the application of
social foci theory in social media. Social foci theory has
received attention in several domains such as relational
learning (Tang and Liu 2009) and structural hole theory
(Burt 2009). Recently, social foci theory has been used to
derive community memberships using both node and edge
attributes (Yang, McAuley, and Leskovec 2013). To the best
of our knowledge, this is the first work that has utilized
concepts from social foci theory to identify answerers for social
media questions.


Problem Definition 

We first describe some general notations. Boldface uppercase
letters (e.g., $\mathbf{X}$) denote matrices and boldface lowercase
letters (e.g., $\mathbf{x}$) denote vectors. $\mathbf{X}_{ij}$ signifies the element in
the $i$-th row and $j$-th column of matrix $\mathbf{X}$, and the $i$-th row of
$\mathbf{X}$ is denoted by $\mathbf{X}_i$. Similarly, $\mathbf{x}_i$ denotes the $i$-th
element of vector $\mathbf{x}$, and $\mathbf{x}_y$ denotes a vector $\mathbf{x}$ corresponding
to the quantity $y$. We denote the Frobenius norm of a matrix
$\mathbf{X}$ as $\|\mathbf{X}\|_F = \sqrt{\sum_{i,j} \mathbf{X}_{ij}^2}$.

We now define some terms related to the questions asked,
the network and content of the asker and his social connections.
We define the attributes of a question $q$ as the set of words
used in the question, i.e., $W_q = [w_{q1}, w_{q2}, \ldots, w_{ql}]$. Since we
are dealing with subjective questions, the asker marking the
answer as useful or publicly acknowledging the answerer
gives the evidence of its acceptance.

Let $A$ denote the asker of the question $q$ and $f_A = [f_1, f_2, \ldots, f_m]$
denote the social connections of $A$, where $m$ is
the number of social connections of $A$. We define the egonetwork
of each asker $A$ as consisting of the asker, the social
connections of the asker and the links among his social
connections. The egonetwork of asker $A$, $\mathbf{N} \in \mathbb{R}^{(m+1) \times (m+1)}$,
is given by

$$\mathbf{N}_{ij} = \begin{cases} 1 & \text{if there is a directed edge from } f_i \text{ to } f_j,\ i \neq j,\ i, j \in \{A, f_A\} \\ 0 & \text{otherwise} \end{cases}$$

We collect the status messages of the asker and his social
connections. We apply basic preprocessing steps such as
removal of stop words and stemming. We then define the
user-word matrix $\mathbf{S} \in \mathbb{R}^{(m+1) \times w}$ of asker $A$ as

$$\mathbf{S}_{ij} = \begin{cases} num \cdot tfidf_j & \text{if user } u_i \text{ has used word } w_j \text{ } num \text{ times} \\ 0 & \text{if user } u_i \text{ has not used the word } w_j, \end{cases}$$

where $num$ is the number of times the user $u_i$ has used the
word $w_j$, $w$ is the total number of words used by the asker
and his social connections and $tfidf_j$ is the tf-idf score of
word $w_j$. A single user will only use a small subset of the
total number of words, resulting in $\mathbf{S}$ being sparse.
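As a rough illustration of the user-word matrix, the sketch below multiplies per-user word counts by a per-word weight. The exact tf-idf formula is not given in the text, so the smoothed inverse frequency over users used here is an assumption, as are the function name `build_user_word_matrix` and the toy tokens; the tokens are assumed to be already stop-word filtered and stemmed.

```python
import math
from collections import Counter


def build_user_word_matrix(docs_per_user):
    """docs_per_user: one token list per user (asker first).

    Returns (vocab, S) with S[i][j] = count(user i, word j) * weight_j.
    """
    counts = [Counter(tokens) for tokens in docs_per_user]
    vocab = sorted(set().union(*[c.keys() for c in counts]))
    n_users = len(docs_per_user)
    # Assumed weighting: smoothed inverse frequency of the word across users.
    df = {w: sum(1 for c in counts if w in c) for w in vocab}
    weight = {w: math.log((1 + n_users) / (1 + df[w])) + 1.0 for w in vocab}
    # Counter returns 0 for unseen words, which yields the zero entries of S.
    S = [[c[w] * weight[w] for w in vocab] for c in counts]
    return vocab, S


vocab, S = build_user_word_matrix(
    [["nfl", "game", "nfl"], ["math", "game"], ["music"]]
)
```

For real data a sparse representation would be preferable, since most entries of S are zero.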

With the terminologies and the notations described above,
we formally define the problem as follows: “Given a question
$q$, an asker $A$, and the network neighborhood of the asker
$f_A$, find a suitable set of people among $f_A$ whose responses
to the question $q$ the asker accepts”.

Information Seeking via Social Foci 

In this section, we describe our framework to identify
answerers for social media questions in detail. First, we infer
social foci memberships of the asker and his social connections
from their network and content information. We then
compute the overlap in foci memberships of the asker and
his social connections in the question domain to identify
answerers to these questions.

Modeling Content Information 

We model the content information to infer the major foci
of the asker and his social connections. We draw from
Non-negative Matrix Factorization (NMF) presented in
(Seung and Lee 2001) to infer foci from the user-word
matrix $\mathbf{S} \in \mathbb{R}^{(m+1) \times w}$. We factorize the matrix $\mathbf{S}$ into two low
dimensional sparse non-negative matrices, $\mathbf{U} \in \mathbb{R}^{(m+1) \times k}$
and $\mathbf{P} \in \mathbb{R}^{w \times k}$ such that $k \ll m$, by solving the following
optimization problem.

$$\min_{\mathbf{U} \ge 0,\, \mathbf{P} \ge 0} \|\mathbf{S} - \mathbf{U}\mathbf{P}^T\|_F^2 \tag{1}$$

Here, $k$ is the number of latent foci in the neighborhood of
the asker and $m$ is the number of his social connections. $\mathbf{U}$
denotes the latent foci memberships of the asker and his
social connections and $\mathbf{P}$ denotes the latent foci memberships
of words. The correlation between foci memberships of the
words can be obtained by the overlap in the corresponding
rows of $\mathbf{P}$. The constraints $\mathbf{U} \ge 0$ and $\mathbf{P} \ge 0$ denote that the
matrices have all non-negative elements. The non-negativity
ensures an intuitive decomposition of the matrix into its
constituent parts.

Integrating Network Information 

In a social setting, the interests or affiliations of a user
are correlated with the interests of his social connections,
thereby affecting his memberships to different foci
(Feld 1981). This notion is also supported by network
homogeneity (Marsden 1988), which says that people connected
to each other display similar interests and affiliations.
Therefore, it is essential to utilize network structure to determine
foci memberships of the asker and his social connections.


To utilize the network structure, we first factorize the ego
network of the asker $\mathbf{N}$ into two low rank non-negative
matrices $\mathbf{U} \in \mathbb{R}^{(m+1) \times k}$ and $\mathbf{V} \in \mathbb{R}^{k \times k}$ s.t. $k \ll m$ by solving
the following optimization problem.

$$\min_{\mathbf{U} \ge 0,\, \mathbf{V} \ge 0} \|\mathbf{N} - \mathbf{U}\mathbf{V}\mathbf{U}^T\|_F^2, \tag{2}$$

where $\mathbf{U}$ contains the membership of the asker and his social
connections to different latent foci and $\mathbf{V}$ contains the
correlations between the foci. The constraints $\mathbf{U} \ge 0$ and $\mathbf{V} \ge 0$
denote that the matrices have only non-negative elements.

We then integrate network and content information to
infer the foci membership of the asker and his social
connections by formulating the following optimization problem.

$$\min_{\mathbf{U} \ge 0,\, \mathbf{V} \ge 0,\, \mathbf{P} \ge 0} \alpha \|\mathbf{S} - \mathbf{U}\mathbf{P}^T\|_F^2 + \beta \|\mathbf{N} - \mathbf{U}\mathbf{V}\mathbf{U}^T\|_F^2 + \gamma (\|\mathbf{U}\|_F^2 + \|\mathbf{V}\|_F^2 + \|\mathbf{P}\|_F^2) \tag{3}$$

Here $\mathbf{U}$ contains the latent foci membership of the asker and
his connections obtained by integrating network and content
information, $\mathbf{P}$ shows the latent foci memberships of
the words and $\mathbf{V}$ represents the correlation between the
latent foci. $\|\mathbf{U}\|_F^2$, $\|\mathbf{V}\|_F^2$, and $\|\mathbf{P}\|_F^2$ are regularization terms
introduced to prevent overfitting and $\gamma$ is the positive
parameter that controls the weight of the regularization terms.
The constraints $\mathbf{U} \ge 0$, $\mathbf{V} \ge 0$, and $\mathbf{P} \ge 0$ denote that the
matrices do not contain negative elements. $\alpha$ and $\beta$ are
positive parameters that control the proportions of content and
network information respectively.
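To make the three parts of the joint objective concrete, the following sketch evaluates it with NumPy for given factor matrices. The function name `objective` and the tiny matrices are hypothetical; the default parameter values are placeholders for illustration.

```python
import numpy as np


def objective(S, N, U, V, P, alpha=1.0, beta=1.0, gamma=0.01):
    """alpha*||S - U P^T||_F^2 + beta*||N - U V U^T||_F^2 + gamma*(||U||^2 + ||V||^2 + ||P||^2)."""
    content = np.linalg.norm(S - U @ P.T, "fro") ** 2
    network = np.linalg.norm(N - U @ V @ U.T, "fro") ** 2
    reg = sum(np.linalg.norm(M, "fro") ** 2 for M in (U, V, P))
    return alpha * content + beta * network + gamma * reg


# Toy factors with k = 1 that reconstruct S and N exactly,
# so only the regularization term contributes.
U = np.array([[1.0], [0.0]])
V = np.array([[1.0]])
P = np.array([[1.0], [2.0]])
S = U @ P.T
N = U @ V @ U.T
J = objective(S, N, U, V, P)
```

With exact reconstruction the content and network terms vanish, which leaves gamma times the squared norms of the three factors.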

We draw from the concepts of the social foci theory
illustrated in Fig. 1 to propose that users sharing a large amount
of foci memberships with the asker in the question domain
can effectively answer social media questions. The shared
foci memberships of the asker with his social connections
are given by the overlap between their corresponding rows in
$\mathbf{U}$. The question domain in the latent foci space is obtained
by combining the rows of $\mathbf{P}$ corresponding to the words in
the question. Before formalizing these notions, we derive
the latent matrices $\mathbf{U}$, $\mathbf{V}$ and $\mathbf{P}$ by solving Eq. (3).

Deriving the Optimal Latent Matrices 

The problem presented in Eq. (3) is a constrained
minimization problem that is not jointly convex in $\mathbf{U}$, $\mathbf{V}$ and
$\mathbf{P}$. Motivated by (Ding et al. 2006), we describe an algorithm
to find locally optimal solutions for $\mathbf{U}$, $\mathbf{V}$ and $\mathbf{P}$. The key
idea is to optimize the objective with respect to one variable
while fixing the others. The three variables are iteratively
updated until convergence.

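From Eq. (3), we let

$$J = \alpha \|\mathbf{S} - \mathbf{U}\mathbf{P}^T\|_F^2 + \beta \|\mathbf{N} - \mathbf{U}\mathbf{V}\mathbf{U}^T\|_F^2 + \gamma (\|\mathbf{U}\|_F^2 + \|\mathbf{V}\|_F^2 + \|\mathbf{P}\|_F^2) \tag{4}$$

We then take the Lagrangian of the objective function $J$. Let
the Lagrange multipliers for the constraints $\mathbf{U} \ge 0$, $\mathbf{V} \ge 0$,
and $\mathbf{P} \ge 0$ be $\Lambda_U$, $\Lambda_V$, and $\Lambda_P$ respectively. Then

$$\mathcal{L} = J + tr(\Lambda_U \mathbf{U}^T) + tr(\Lambda_V \mathbf{V}^T) + tr(\Lambda_P \mathbf{P}^T) \tag{5}$$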

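We compute the partial derivatives of the Lagrangian $\mathcal{L}$ with
respect to $\mathbf{U}$, $\mathbf{V}$, and $\mathbf{P}$, keeping the other variables fixed, as
shown below.

$$\frac{\partial \mathcal{L}}{\partial \mathbf{U}} = 2(\alpha(-\mathbf{S}\mathbf{P} + \mathbf{U}\mathbf{P}^T\mathbf{P}) + \beta(-\mathbf{N}^T\mathbf{U}\mathbf{V} - \mathbf{N}\mathbf{U}\mathbf{V}^T + \mathbf{U}\mathbf{V}\mathbf{U}^T\mathbf{U}\mathbf{V}^T + \mathbf{U}\mathbf{V}^T\mathbf{U}^T\mathbf{U}\mathbf{V}) + \gamma \mathbf{U}) + \Lambda_U$$

$$\frac{\partial \mathcal{L}}{\partial \mathbf{V}} = 2(\beta(-\mathbf{U}^T\mathbf{N}\mathbf{U} + \mathbf{U}^T\mathbf{U}\mathbf{V}\mathbf{U}^T\mathbf{U}) + \gamma \mathbf{V}) + \Lambda_V$$

$$\frac{\partial \mathcal{L}}{\partial \mathbf{P}} = 2(\alpha(-\mathbf{S}^T\mathbf{U} + \mathbf{P}\mathbf{U}^T\mathbf{U}) + \gamma \mathbf{P}) + \Lambda_P. \tag{6}$$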
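The step from these gradients to a multiplicative update can be made explicit for $\mathbf{U}$; the cases of $\mathbf{V}$ and $\mathbf{P}$ are analogous. The sketch below assumes the standard treatment in which the gradient of $J$ is split into two non-negative parts; the symbols $\mathbf{G}^{-}$ and $\mathbf{G}^{+}$ are introduced here for illustration only and do not appear in the original derivation.

```latex
% Split the gradient of J with respect to U into non-negative parts:
% dJ/dU = 2 (G^{+} - G^{-}), with
\mathbf{G}^{-} = \alpha \mathbf{S}\mathbf{P}
  + \beta(\mathbf{N}^T\mathbf{U}\mathbf{V} + \mathbf{N}\mathbf{U}\mathbf{V}^T), \qquad
\mathbf{G}^{+} = \alpha \mathbf{U}\mathbf{P}^T\mathbf{P}
  + \beta(\mathbf{U}\mathbf{V}\mathbf{U}^T\mathbf{U}\mathbf{V}^T
  + \mathbf{U}\mathbf{V}^T\mathbf{U}^T\mathbf{U}\mathbf{V}) + \gamma \mathbf{U}

% Stationarity of the Lagrangian gives \Lambda_U = -2(G^{+} - G^{-}).
% KKT complementary slackness requires (\Lambda_U)_{ij} U_{ij} = 0, hence
(\mathbf{G}^{+}_{ij} - \mathbf{G}^{-}_{ij})\, \mathbf{U}_{ij} = 0

% A multiplicative rule whose fixed points satisfy this condition:
\mathbf{U}_{ij} \leftarrow \mathbf{U}_{ij}
  \sqrt{\frac{\mathbf{G}^{-}_{ij}}{\mathbf{G}^{+}_{ij}}}
```

At a fixed point either $\mathbf{U}_{ij} = 0$ or $\mathbf{G}^{-}_{ij} = \mathbf{G}^{+}_{ij}$, which is exactly the slackness condition; since all matrices involved are non-negative, the update also keeps $\mathbf{U}$ non-negative.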

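Substituting the KKT complementary conditions in Eq. (6)
and rearranging, we get the following update rules for the latent
matrices $\mathbf{U}$, $\mathbf{V}$, and $\mathbf{P}$.

$$\mathbf{U}_{ij} \leftarrow \mathbf{U}_{ij} \sqrt{\frac{[\alpha \mathbf{S}\mathbf{P} + \beta(\mathbf{N}^T\mathbf{U}\mathbf{V} + \mathbf{N}\mathbf{U}\mathbf{V}^T)]_{ij}}{[\alpha \mathbf{U}\mathbf{P}^T\mathbf{P} + \beta(\mathbf{U}\mathbf{V}\mathbf{U}^T\mathbf{U}\mathbf{V}^T + \mathbf{U}\mathbf{V}^T\mathbf{U}^T\mathbf{U}\mathbf{V}) + \gamma \mathbf{U}]_{ij}}}$$

$$\mathbf{V}_{ij} \leftarrow \mathbf{V}_{ij} \sqrt{\frac{[\beta \mathbf{U}^T\mathbf{N}\mathbf{U}]_{ij}}{[\beta(\mathbf{U}^T\mathbf{U}\mathbf{V}\mathbf{U}^T\mathbf{U}) + \gamma \mathbf{V}]_{ij}}}$$

$$\mathbf{P}_{ij} \leftarrow \mathbf{P}_{ij} \sqrt{\frac{[\alpha \mathbf{S}^T\mathbf{U}]_{ij}}{[\alpha \mathbf{P}\mathbf{U}^T\mathbf{U} + \gamma \mathbf{P}]_{ij}}}. \tag{7}$$

The optimization algorithm is summarized in Steps 1-7 in
Algorithm 1. The square root in the update rules is added
to ensure convergence (Ding, Li, and Jordan 2008). The
correctness and convergence of the rules can be proved by the
auxiliary function method (Lee and Seung 20^0).


Identifying Answerers from Foci Information

We now identify relevant answerers from the social connections
of the asker using the latent matrices $\mathbf{U}$, $\mathbf{V}$ and $\mathbf{P}$. We
first extract the words from the question attribute vector $W_q$
and obtain the foci memberships of each word from the
corresponding rows in matrix $\mathbf{P}$. We then compute the domain
of the question in the latent foci space as a combination of
individual word membership vectors as

$$\mathbf{d}_q = \sum_{w_i \in W_q} \mathbf{P}_i, \tag{8}$$

where $\mathbf{d}_q$ represents the domain of the question $q$ in the
latent foci space and $w_i$ is the word corresponding to the $i$-th
row of $\mathbf{P}$.

We next compute the foci memberships of the asker
and his social connections in the question domain. The
Hadamard product of two vectors is the pointwise product
of their respective elements, and it exactly captures this
notion. For each question, we compute the Hadamard product
of the row of $\mathbf{U}$ corresponding to the asker, $\mathbf{U}_A$, and the
vector representing the question domain $\mathbf{d}_q$.

$$\mathbf{g}_A = \mathbf{U}_A \circ \mathbf{d}_q, \tag{9}$$

where $\mathbf{g}_A$ contains the foci membership of the asker in
the domain of the question. Similarly, we compute the foci
memberships of each social connection of the asker in the
domain of the question $q$ by

$$\mathbf{g}_{f_m} = \mathbf{U}_{f_m} \circ \mathbf{d}_q, \tag{10}$$

where $f_m$ is the $m$-th social connection of the asker, $\mathbf{U}_{f_m}$ is the
row of matrix $\mathbf{U}$ corresponding to $f_m$ and $\mathbf{g}_{f_m}$ contains the
foci membership of $f_m$ w.r.t. the domain of the question.

Finally, we find the overlap in foci memberships of the
asker and his social connections in the question domain as

$$rs(q, A, f_m) = sim(\mathbf{g}_A, \mathbf{g}_{f_m}), \tag{11}$$

where $rs(q, A, f_m)$ denotes the score of the answerer $f_m$ to
the question $q$ by the asker $A$. We sort the answerers according
to their score and return them to the asker as a ranked
list, $ra$. Results with different similarity metrics are presented
in Table 2. The method for identifying answerers from foci
information is summarized in Steps 8-11 in Algorithm 1.
The quantity $rs(q, A, f_m)$ signifies the context in terms of
network and content shared between asker $A$ and his social
connection $f_m$ in the domain of question $q$.

Algorithm 1: Automatic Identification of
Answerers to Social Media Questions

Input: Question $q$ of asker $A$, friends and followers
of $A$, $f_A = [f_1, f_2, \ldots, f_m]$, egonetwork of the asker $\mathbf{N}$,
user-word matrix of the asker and his connections $\mathbf{S}$,
and $\{\alpha, \beta, \gamma, k\}$

Output: A ranked list of the potential answerers $ra$
 1: Initialize $\mathbf{U}$, $\mathbf{V}$, $\mathbf{P}$ randomly
 2: while not convergent do
 3:   update
 4:   $\mathbf{U}_{ij} \leftarrow \mathbf{U}_{ij} \sqrt{\frac{[\alpha \mathbf{S}\mathbf{P} + \beta(\mathbf{N}^T\mathbf{U}\mathbf{V} + \mathbf{N}\mathbf{U}\mathbf{V}^T)]_{ij}}{[\alpha \mathbf{U}\mathbf{P}^T\mathbf{P} + \beta(\mathbf{U}\mathbf{V}\mathbf{U}^T\mathbf{U}\mathbf{V}^T + \mathbf{U}\mathbf{V}^T\mathbf{U}^T\mathbf{U}\mathbf{V}) + \gamma \mathbf{U}]_{ij}}}$
 5:   $\mathbf{V}_{ij} \leftarrow \mathbf{V}_{ij} \sqrt{\frac{[\beta \mathbf{U}^T\mathbf{N}\mathbf{U}]_{ij}}{[\beta(\mathbf{U}^T\mathbf{U}\mathbf{V}\mathbf{U}^T\mathbf{U}) + \gamma \mathbf{V}]_{ij}}}$
 6:   $\mathbf{P}_{ij} \leftarrow \mathbf{P}_{ij} \sqrt{\frac{[\alpha \mathbf{S}^T\mathbf{U}]_{ij}}{[\alpha \mathbf{P}\mathbf{U}^T\mathbf{U} + \gamma \mathbf{P}]_{ij}}}$
 7: end while
 8: $W_q = [w_{q1}, w_{q2}, \ldots, w_{ql}]$, $\mathbf{d}_q = \sum_{w_i \in W_q} \mathbf{P}_i$
 9: $\mathbf{g}_A = \mathbf{U}_A \circ \mathbf{d}_q$, $\mathbf{g}_{f_m} = \mathbf{U}_{f_m} \circ \mathbf{d}_q$
10: $rs(q, A, f_m) = sim(\mathbf{g}_A, \mathbf{g}_{f_m})$
11: $ra = sort(rs)$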
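The two stages described above, multiplicative updates followed by ranking in the question domain, can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: a fixed iteration budget replaces the convergence test, a small `eps` guards against division by zero, cosine similarity is used for `sim`, and the names `fit_foci` and `rank_answerers` as well as the toy matrices are hypothetical.

```python
import numpy as np


def fit_foci(S, N, k, alpha=1.0, beta=1.0, gamma=0.01, n_iter=200, eps=1e-12, seed=0):
    """Multiplicative updates for U (users x k), V (k x k), P (words x k)."""
    rng = np.random.default_rng(seed)
    n_users, n_words = S.shape
    U = rng.random((n_users, k))
    V = rng.random((k, k))
    P = rng.random((n_words, k))
    for _ in range(n_iter):  # assumption: fixed budget instead of a convergence check
        UtU = U.T @ U
        num = alpha * (S @ P) + beta * (N.T @ U @ V + N @ U @ V.T)
        den = (alpha * (U @ (P.T @ P))
               + beta * (U @ (V @ UtU @ V.T) + U @ (V.T @ UtU @ V))
               + gamma * U)
        U = U * np.sqrt(num / (den + eps))
        UtU = U.T @ U
        V = V * np.sqrt(beta * (U.T @ N @ U) / (beta * (UtU @ V @ UtU) + gamma * V + eps))
        P = P * np.sqrt(alpha * (S.T @ U) / (alpha * (P @ UtU) + gamma * P + eps))
    return U, V, P


def rank_answerers(U, P, question_word_ids, asker_row=0):
    """Question domain, Hadamard products, similarity and sorting.

    Returns (order, scores); both index the connections in their original
    order with the asker's row removed.
    """
    d_q = P[question_word_ids].sum(axis=0)      # sum of word membership rows
    g = U * d_q                                 # row i is U_i (Hadamard) d_q
    g_asker = g[asker_row]
    g_friends = np.delete(g, asker_row, axis=0)
    norms = np.linalg.norm(g_friends, axis=1) * np.linalg.norm(g_asker)
    scores = (g_friends @ g_asker) / (norms + 1e-12)   # cosine similarity (assumed)
    order = np.argsort(-scores)                 # best candidate first
    return order, scores


# Hypothetical toy data: 1 asker + 3 connections, 3 words, k = 2 foci.
S = np.array([[2.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 1.0, 1.0]])
N = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
U, V, P = fit_foci(S, N, k=2)
order, scores = rank_answerers(U, P, question_word_ids=[0, 2])
```

Because the factorization does not depend on the question, `fit_foci` can be run ahead of time and only `rank_answerers` needs to run when a question arrives.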

Time Complexity 

The highest time cost results from updating the latent
matrices in steps 4-6. In the update terms, the complexity
of the terms $\mathbf{S}\mathbf{P}$ and $\mathbf{S}^T\mathbf{U}$ is low due to the sparsity of
$\mathbf{S}$. Due to the sparsity of $\mathbf{N}$, the terms $\mathbf{N}^T\mathbf{U}\mathbf{V}$, $\mathbf{N}\mathbf{U}\mathbf{V}^T$
and $\mathbf{U}^T\mathbf{N}\mathbf{U}$ have a complexity of $O(mk^2)$, where $m$ is the
number of friends and $k$ is the number of latent dimensions.
The terms $\mathbf{U}(\mathbf{V}(\mathbf{U}^T\mathbf{U})\mathbf{V}^T)$, $\mathbf{U}(\mathbf{V}^T(\mathbf{U}^T\mathbf{U})\mathbf{V})$ and
$(\mathbf{U}^T\mathbf{U})\mathbf{V}(\mathbf{U}^T\mathbf{U})$ have a complexity of $O(mk^2)$ when
computed as shown in the brackets. The complexity of $\mathbf{P}\mathbf{U}^T\mathbf{U}$
and $\mathbf{U}\mathbf{P}^T\mathbf{P}$ is $O((w+m)k^2)$, where $w$ is the number of words.
Therefore, the overall complexity of a single iteration is
$O((w+m)k^2)$, which is low owing to the small number of
latent dimensions. In addition, notice that steps 1-7 can be
computed offline and only steps 8-11 are computed when
the question is asked, further reducing the time required to
identify answerers for a given question.
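The effect of the bracketed evaluation order can be checked by counting scalar multiplications. The sketch below assumes dense matrices and textbook matrix multiplication, so it ignores the sparsity of N and S; the helper names and the sizes `m = 1000`, `k = 50` are hypothetical values chosen for illustration.

```python
def matmul_cost(shape_a, shape_b):
    """Scalar multiplications for a dense (p x q) @ (q x r) product, plus the result shape."""
    (p, q), (q2, r) = shape_a, shape_b
    assert q == q2, "inner dimensions must agree"
    return p * q * r, (p, r)


def chain_cost(shapes):
    """Cost of multiplying the given shapes strictly left to right."""
    total, current = 0, shapes[0]
    for nxt in shapes[1:]:
        cost, current = matmul_cost(current, nxt)
        total += cost
    return total


m, k = 1000, 50
U, V, Ut, Vt = (m + 1, k), (k, k), (k, m + 1), (k, k)

# Left to right: U V U^T U V^T builds an (m+1) x (m+1) intermediate.
naive = chain_cost([U, V, Ut, U, Vt])

# Bracketed as U (V (U^T U) V^T): every intermediate stays k x k until the last product.
c0, kk = matmul_cost(Ut, U)    # U^T U
c1, kk = matmul_cost(V, kk)    # V (U^T U)
c2, kk = matmul_cost(kk, Vt)   # (V (U^T U)) V^T
c3, _ = matmul_cost(U, kk)     # U (...)
bracketed = c0 + c1 + c2 + c3
```

With these sizes the bracketed order needs roughly twenty times fewer multiplications, and the gap grows linearly with the number of connections.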

Experiments 

In this section, we first present a dataset of questions posted
on Twitter and then conduct experiments to answer the
following questions that help in understanding the framework
better: How does the proposed framework perform in
comparison to existing baselines? What is the effect of the
amount of network and content information on the performance
of the framework?

Parameter                                      Statistics
# of Questions                                 1065
# of Askers                                    1026
# of Selected Answers                          1450
# of Followers and Friends of the askers       966,117
Median # of Followers and Friends per asker    588
Median # Tweets per user                       479

Table 1: Dataset containing questions posted in Twitter with
statistics related to network and content information.

Dataset 

The dataset consists of subjective questions from the social
media platform Twitter. We follow the literature on
questions in Twitter (Morris, Teevan, and Panovich 2010) to
construct a keyword set related to subjective questions.
We append “?” to each keyword to collect questions from
the Twitter Streaming API. Texts having “?” in online
content are shown to be questions with high precision
(Cong et al. 2008). We deem a reply to have been accepted
by the asker if he has marked it as “favorite” or acknowledged
the answerer by using “thanks” or “thank you”. We
mark the users who provided these answers as the ground
truth for each question following (Hecht et al. 2012). Some
important statistics of the dataset are given in Table 1. The
first question was posted on Dec 27, 2013 and the last one on
Jan 15, 2014. We use the methods in the public Twitter API
to collect the friends, followers and public status messages
of the asker to obtain the asker’s social connections and their
interests (Kumar, Morstatter, and Liu 2013). We use the data
to construct the ego network $\mathbf{N}$ and user-word matrix $\mathbf{S}$ for
each asker.
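The collection and labelling heuristics described above can be sketched as two small helpers. The function names, the example keyword, and the input format (a favorite flag plus the asker's follow-up text) are hypothetical; the actual keyword set and the Twitter API calls are not reproduced here.

```python
import re

# Acknowledgement phrases named in the text.
THANKS = re.compile(r"\b(thanks|thank you)\b", re.IGNORECASE)


def build_track_terms(keywords):
    """Append a question mark to each keyword to form the tracked search terms."""
    return [kw + "?" for kw in keywords]


def is_accepted(favorited_by_asker, asker_followup_text):
    """A reply counts as accepted if the asker favorited it or thanked the answerer."""
    return bool(favorited_by_asker or THANKS.search(asker_followup_text or ""))


terms = build_track_terms(["any suggestions"])   # hypothetical keyword
```

For example, `is_accepted(False, "Thanks a lot!")` marks the reply as accepted, while a follow-up without an acknowledgement and without a favorite does not.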


Experiment Settings

We introduce the following metrics to evaluate the
performance of our framework: the Mean Reciprocal Rank
(MRR) (Radev et al. 2002) is a measure of the overall
likelihood of the framework to identify an answerer
for a question, the Mean Average Precision (MAP)
(Bian et al. 2008) measures the potential satisfaction of the
asker with the top K results, and the Normalised Discounted
Cumulative Gain (NDCG)@K considers the order within
the top K rankings (Wang et al. 2013b). We use the following
baselines to evaluate the performance of our framework.

Random: We randomly order the friends and followers of
the asker 100 times and return the mean ordering.

Aardvark (Horowitz and Kamvar 2010): This paper
describes a search engine which directs questions posted on
the system to users, with a formulation to compute affinity with
the asker and interest in the question topics. It does not
consider the network structure and also does not contain
experimental evaluations of its formulation.

Content based Methods (Riahi et al. 2012): The paper
focuses on community Q&A like Yahoo! Answers and
compares the similarity of the question topic with the interests of
the answerers derived only from their content. The interests
were inferred by two topic models: LDA and the Segmented
Topic Model (STM) (Du, Buntine, and Jin 2010).

Topic Sensitive Page Rank (TSPR) (Zhou et al. 2012): This
paper employs a PageRank based approach to find subject
matter experts in the question topic by combining network and
content information of the potential answerers. The paper
identifies topical authorities without considering the shared
context between the asker and the answerers.

Shared Foci: This baseline measures the effect of shared
user context. It computes the shared foci memberships of
the asker and his social connections derived from either
network ($\alpha = 0$) or content ($\beta = 0$) information. The question
information is not taken into consideration. This also helps in
evaluating methods using only network structure.

For initial experiments, we set the parameters in Eq. (3) as
follows. The regularization parameter is set at $\gamma = 0.01$. The
number of topics in the baselines and the number of foci $k$ is
set as 50. For initial evaluation of the framework, we choose
$\alpha = 1$ and $\beta = 1$. The performance for different values of $\alpha$ and
$\beta$ is presented in a later subsection.

Method                   MRR      MAP@5    NDCG@5
Random                   1.20%    1.12%    0.25%
Content-LDA              1.56%    1.46%    0.30%
Content-STM              1.93%    2.27%    0.50%
TSPR                     1.64%    1.63%    0.45%
Aardvark                 2.11%    2.53%    0.50%
Shared Foci (Network)    3.43%    3.66%    0.97%
Shared Foci (Content)    3.60%    3.87%    1.17%
Our Model (Cosine)       3.91%    4.63%    1.25%
Our Model (PCC)          3.80%    4.73%    1.31%
Our Model (Euclidean)    4.36%    5.54%    1.41%

Table 2: Comparison of performance of the proposed
framework with baselines.

Performance Evaluation 

The results of the evaluations are presented in Table 2. From
Table 2, we can see that the proposed framework has
outperformed the baselines by a considerable margin. We
conducted a paired t-test to compare the performance of our
framework with that of the baselines, and the results
indicated the difference between them is significant. We make
the following observations from the table.

The proposed framework gives more than 300% improvement
over random selection. We can see that simple formulations
like the one in Aardvark, which considers social network
information, perform on par with complex topic models
using only content, such as STM. The proposed framework
also performs significantly better than methods identifying
subject matter experts as answerers, such as TSPR. This
emphasizes the importance of social context to identify
answerers to social media questions.

Considering shared foci between the asker and the
answerer improves the performance over methods like Aardvark
that do not utilize community memberships. This shows the
effectiveness of using social foci to exploit social context.
Incorporating question information to consider the overlap
only in the foci related to the question gives further
improvement in the performance.

Figure 2: Effect of variation of content and network
proportions on the framework performance for MAP.

In summary, by designing approaches based on shared
social context and exploiting the structure of social ties, the
proposed framework can effectively identify answerers for
social media questions in the dataset. Next, we wish to
understand the effect of content and network information on
the performance of our framework.

Effect of Content and Network Information 

In the model presented in Eq. (3), $\alpha$ and $\beta$ control the
proportion of the content and network information respectively.
In order to evaluate the framework for different proportions
of content and network, we set $\alpha = [0.1, 1, 10]$ and
$\beta = [0, 0.1, 1, 10]$ and plot the values for MAP in Fig. 2,
arbitrarily using cosine similarity as the similarity metric. We
make the following observations from the figure.

A general trend in Fig. 2 is a peak at the main diagonal
of the $\alpha$ and $\beta$ axes and an off-diagonal dip. This shows that
the framework works best for nearly equal proportions of
network and content information. The MAP value is greater
than 3% for all $\alpha$ and $\beta$ except for low proportions of
network information ($\alpha = 10$, $\beta = [0, 0.1]$). This emphasizes
the importance of social connections of the asker for
identifying answerers to social media questions. The lowest
performance across all parameter values is more than twice that
of random ordering, indicating the effectiveness of the
framework for low relative proportions of content or network
information. Overall, the MAP value is above 3% for different
combinations of $\alpha$ and $\beta$, indicating the effectiveness of the
framework for a wide range of parameter values.

In summary, the framework performs well over different
proportions of network and content and is robust to their
variation. An appropriate combination of network and content
information can optimize the effectiveness of the framework
for identifying answerers to social media questions.

Performance across Question Categories 

Literature on social media questions has identified the kinds of
questions people ask on Twitter. Recommendation, opinion,
factual and rhetorical questions are popular questions
asked on Twitter (Morris, Teevan, and Panovich 2010;
Paul, Hong, and Chi 2011). We select four categories related
to subjective questions, “Suggestions”, “Opinion”, “Favor”,
and “Rhetorical”, and evaluate our framework in identifying
answerers for different question categories.

Categories     Parts     MRR               MAP@5
Suggestions    39.83%    4.27% (+2.23%)    4.68% (+1.78%)
Opinion        16.42%    2.67% (+1.43%)    2.38% (+1.61%)
Favor          30.51%    3.65% (+1.55%)    4.39% (+1.01%)
Rhetorical     6.74%     1.75% (+1.17%)    0% (+0%)

Table 3: Performance for different question categories.

We employed human labelling to assign category labels
to questions. Three people independently labeled the questions,
and the labels were assigned using majority selection.
Employing this procedure, 93.5% of the questions were
assigned to one of the four categories and the framework
was evaluated on them. The results of the evaluations are
presented in Table 3. The distribution of different question
categories is given in the first column. The performance
for different categories is listed in the other columns. The
improvement over (Horowitz and Kamvar 2010), the nearest
baseline not a part of our method, for different question
categories is shown in the brackets.
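The majority selection over three annotators can be sketched in a few lines. Returning `None` when no label reaches a strict majority is an assumption made here for illustration; how such questions were actually handled is not described. The function name `majority_label` is hypothetical.

```python
from collections import Counter


def majority_label(labels):
    """Return the label chosen by a strict majority of annotators, or None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


agreed = majority_label(["Opinion", "Opinion", "Favor"])
undecided = majority_label(["Opinion", "Favor", "Rhetorical"])
```

With three annotators a strict majority means at least two matching labels, so a three-way split leaves the question unlabeled in this sketch.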

From the table, we see that the framework gives
considerable improvements over all the selected question
categories. A paired t-test suggested that the improvements
are significant, indicating that the framework is effective in
finding answerers to a wide range of question categories
in Twitter. The best performance can be seen in the
“Suggestions” and “Favor” categories, and the performance in
“Opinion” is relatively lower. These results suggest that
identifying answerers for the “Opinion” category might
depend on additional factors such as similarity of views on a
given topic. The framework gives the lowest performance
for questions in the “Rhetorical” category. Rhetorical
questions are classified as conversational questions in the
literature (Harper, Moy, and Konstan 2009). They might be used
as an expression of opinion or to initiate a conversation and
not to express an information need.


Conclusion and Future Work 

Online social media provides a new platform for people
seeking information from their social connections. Social
media questions represent a form of information seeking
behavior of users. Questions are subjective and personal to the
asker, and his social context is useful to identify answerers.
We draw from sociological theories to present a novel
framework to infer the shared context between the asker and the
answerers in the question domain. We evaluate the framework
on questions on Twitter and demonstrate its effectiveness
in identifying answerers. The framework is robust to
a wide range of proportions of network and content
information and categories of social media questions. The paper
provides the first framework with experimental evaluations
to identify answerers to questions in social media.

Frameworks exist to identify answerers to factual questions
prevalent in community Q&A platforms like Yahoo!
Answers and StackOverflow. Incorporating concepts from
them will enable us to tackle more diverse questions. During
situations like natural disasters, social media users propagate
requests for help throughout the network. Identifying
answerers in these situations will require an understanding
of information propagation and information seeking behavior.
Identifying users providing misinformation to questions
in social media will help to increase the effectiveness of social
media as a quality information source.